This paper describes a set of comparative experiments on the problem of automatically filtering unwanted electronic mail messages. Several variants of the AdaBoost algorithm with confidence-rated predictions [Schapire & Singer, 99] have been applied, differing in the complexity of the base learners considered. Two main conclusions can be drawn from our experiments: a) the boosting-based methods clearly outperform the baseline learning algorithms (Naive Bayes and Induction of Decision Trees) on the PU1 corpus, achieving very high levels of the F1 measure; b) increasing the complexity of the base learners allows us to obtain better ``high-precision'' classifiers, which is a very important issue when misclassification costs are considered.
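To make the boosting setup concrete, the following is a minimal sketch of discrete AdaBoost with one-level decision stumps over binary word-presence features, in the spirit of a spam filter. It is an illustrative toy, not the confidence-rated variant of Schapire & Singer nor the authors' actual system; the tiny dataset and feature names are invented for the example.

```python
import math

def stump_predict(x, feat, polarity):
    # One-level decision stump: vote +polarity if the word is present.
    return polarity if x[feat] == 1 else -polarity

def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost over all (feature, polarity) stumps. y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n          # example weights, start uniform
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for feat in range(d):
            for polarity in (1, -1):
                err = sum(w[i] for i in range(n)
                          if stump_predict(X[i], feat, polarity) != y[i])
                if best is None or err < best[0]:
                    best = (err, feat, polarity)
        err, feat, polarity = best
        err = max(err, 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, polarity))
        # Reweight: misclassified examples gain weight, then renormalize.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(X[i], feat, polarity))
             for i in range(n)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps; sign gives the spam/ham decision.
    score = sum(a * stump_predict(x, f, p) for a, f, p in ensemble)
    return 1 if score >= 0 else -1

# Toy corpus: features = [contains "free", contains "meeting"];
# label +1 = spam, -1 = legitimate mail (hypothetical data).
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, -1, -1]
ensemble = train_adaboost(X, y, rounds=3)
```

Richer base learners (deeper trees instead of stumps) correspond to the "increasing the complexity" axis explored in the experiments.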